-
Notifications
You must be signed in to change notification settings - Fork 117
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding unfollow action in ism to invoke stop replication for ccr #1198
Conversation
Signed-off-by: aggarwalShivani <shivani.aggarwal@nokia.com>
Hi,
Looking forward for the review, Thanks! |
Hi Reviewers, Environment setup details 1. Setup Opensearch
(Above steps can be avoided if you already have opensearch setup)
For ex. bin/opensearch-plugin install file:///home/username/ccr/opensearch-cross-cluster-replication-3.0.0.0-SNAPSHOT.zip 2. Setup Configs
(This is using the default opensearch.yml file and just adding minimal required configs for CCR)
For steps 3-5, I've added the steps in a shell script for convenience - test-unfollow-script.txt. (Changed the extension to .txt as .sh files cannot be uploaded here) I hope that helps : ) |
Signed-off-by: aggarwalShivani <shivani.aggarwal@nokia.com>
it, | ||
) | ||
}*/ | ||
val response = performStopAction(context.client as NodeClient, stopIndexReplicationRequestObj) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I read the CCR doc and see if we stop relication, we won't be able to resume anymore.
In ISM, probably the normal workflow is after the leader index rollover, we can then stop the replication
I'm wondering how do we know that so to prevent early stopping the replication.
not requiring this since probably we can just mention this caveat in the documentation, and give a long waiting time before stop relication in the follower cluster
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, replication cannot be resumed on an index after stopping (or pausing more than 12h).
Hmm, the major use-case we had identified in ISM+CCR case was - say ISM is setup in both leader and follower clusters, for deletion and other housekeeping operations in the respective clusters.
In the follower cluster, even if the user has setup a policy and intends to delete some-pattern* indices, it would not be allowed as they would be still read-only due to ongoing replication, which needs to be stopped first.
So in such cases, users could chain the actions to be preceeded by stop-replication first (before any other write actions).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood.
I'm just a little worried about one part: the follower index cannot know whether leader index finishes indexing, so it may stop relication early unexpectedly
but this can be waited for community feedback whether it's needed
src/main/kotlin/org/opensearch/indexmanagement/indexstatemanagement/action/UnfollowAction.kt
Outdated
Show resolved
Hide resolved
/*val response: AcknowledgedResponse = | ||
ReplicationPluginInterface.suspendUntil { | ||
this.stopReplication( | ||
context.client as NodeClient, | ||
stopIndexReplicationRequestObj, | ||
it, | ||
) | ||
}*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure 👍
I had added this to get some suggestions on writing mock-uts for AttemptUnfollowStep.kt as I have tried with two different approaches using a function performStopAction() and directly invoking ReplicationPluginInterface.suspendUntil, but to no success.
Do you have any suggestions on either partially mocking AttempUnfollowStep or mocking ReplicationPluginInterface.stopReplication in this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wondering did you try mock the client behavior like this, I suppose it would be pretty similar since both return acknowledge response
Lines 172 to 205 in 4d8ef69
private fun getIndicesAdminClient( | |
rolloverResponse: RolloverResponse?, | |
aliasResponse: AcknowledgedResponse?, | |
rolloverException: Exception?, | |
aliasException: Exception?, | |
): IndicesAdminClient { | |
assertTrue( | |
"Must provide one and only one response or exception", | |
(rolloverResponse != null).xor(rolloverException != null), | |
) | |
assertTrue( | |
"Must provide one and only one response or exception", | |
(aliasResponse != null).xor(aliasException != null), | |
) | |
return mock { | |
doAnswer { invocationOnMock -> | |
val listener = invocationOnMock.getArgument<ActionListener<AcknowledgedResponse>>(1) | |
if (rolloverResponse != null) { | |
listener.onResponse(rolloverResponse) | |
} else { | |
listener.onFailure(rolloverException) | |
} | |
}.whenever(this.mock).rolloverIndex(any(), any()) | |
doAnswer { invocationOnMock -> | |
val listener = invocationOnMock.getArgument<ActionListener<AcknowledgedResponse>>(1) | |
if (aliasResponse != null) { | |
listener.onResponse(aliasResponse) | |
} else { | |
listener.onFailure(aliasException) | |
} | |
}.whenever(this.mock).aliases(any(), any()) | |
} | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey,
Yes I did try this. But there is a slight difference in these two classes, which is the trouble.
AttemptRolloverStep executes the rollover like this
val response: RolloverResponse = context.client.admin().indices().suspendUntil { rolloverIndex(request, it) }
In AttemptRolloverStepTests, context.client has been mocked in a way that when the rolloverIndex() is invoked by the test, it would return the value as directed by the mocked function.
But in the AttemptUnfollowStep.kt class, the execute() method invokes it like this
val response: AcknowledgedResponse =
ReplicationPluginInterface.suspendUntil {
this.stopReplication(
context.client as NodeClient,
stopIndexReplicationRequestObj,
it,
)
}
Here, I can mock the context.client and other params, but the test needs to mock ReplicationPluginInterface and ReplicationPluginInterface.suspendUntil to ultimately mock the response of stopReplication() function. If that does not happen, it would run the actual implementation and not the mocked one - which does not work in the UT.
I'm unable to mock ReplicationPluginInterface as it is static. Tried another approach using mock spy (to mock AttemptUnfollowStep.performStopAction() partially), and that hasn't worked either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @bowenlan-amzn , Any thoughts here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aggarwalShivani it's published from this action
https://github.com/opensearch-project/cross-cluster-replication/blob/e4f2143f294c32936f9d3652f3db4a7eaefd274a/.github/workflows/maven-publish.yml
would you mind just raise a PR in ccr and update this file to be same as the ISM one
We can see this action never run in ccr https://github.com/opensearch-project/cross-cluster-replication/actions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion @bowenlan-amzn.
Noticed that the github action file is there in 2.x branch but not in main branch of CCR.
For main, I have created a PR in CCR for this
opensearch-project/cross-cluster-replication#1509. Should see if this helps resolving the issue.
Hi all
|
Have reviewed all the PRs, and setup my local run on 2.19.1 and able to succeed. |
Signed-off-by: aggarwalShivani <39588384+aggarwalShivani@users.noreply.github.com>
Signed-off-by: aggarwalShivani <shivani.aggarwal@nokia.com>
Signed-off-by: aggarwalShivani <shivani.aggarwal@nokia.com>
Signed-off-by: aggarwalShivani <shivani.aggarwal@nokia.com>
Thank You for the review and valuable feedback throughout Pls note: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much!
Issue #, if available: #726
This is to add support for unfollow feature in ism.
It depends on two PRs already raised and under-review in common-utils project as well as CCR project. Detailed information about the proposed solution is explained there.
Description of changes:
CheckList:
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.